make some calculations 250-500 times faster#372

Merged: printfn merged 3 commits into printfn:main from Safari77:performance on Jan 1, 2026

@Safari77 Safari77 commented Dec 31, 2025

Before:
$ time ./fend 2^1555.555|b3sum
0567ac5de698ba87f79145ebf29c1ab169d37b185e76ddd4b8ecc2de2c82d4de -

real 1m42,334s
user 1m41,908s
sys 0m0,032s

After:
$ time ./fend 2^1555.555|b3sum
0567ac5de698ba87f79145ebf29c1ab169d37b185e76ddd4b8ecc2de2c82d4de -

real 0m0,401s
user 0m0,401s
sys 0m0,004s

And for the gcd:

Before:
$ time ./fend 0.9911^192|b3sum
315e6e5e5b9716413c0e6cbe347ed8a7e2923b3cec6290db7d88ff9710aea09e -

real    0m19,810s
user    0m19,757s
sys     0m0,005s

After:
$ time ./fend 0.9911^192|b3sum
315e6e5e5b9716413c0e6cbe347ed8a7e2923b3cec6290db7d88ff9710aea09e -

real    0m0,049s
user    0m0,040s

System information: rust 1.92.0, LLVM 20.1.8 x86_64-redhat-linux-gnu, Intel i5-13600K, opt-level=3, ~/.cargo/config.toml contents:
[build]
rustflags = ["-C", "target-cpu=native"]

Functions changed:

add_assign_internal: Replaces high-level get/set overhead with efficient, vectorizable slice iteration using .zip() to eliminate bounds checks in the hot loop.
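As a rough illustration of the slice-and-zip idea, here is a minimal sketch of limb-wise addition with carry. The name `add_assign_limbs` and the little-endian `Vec<u64>` layout are assumptions for this example and are not taken from fend's `biguint.rs`.

```rust
/// Adds `rhs` into `lhs`, limb by limb, least significant limb first.
/// Hypothetical helper for illustration; not fend's actual code.
fn add_assign_limbs(lhs: &mut Vec<u64>, rhs: &[u64]) {
    // Make sure every limb of `rhs` has a partner in `lhs`.
    if lhs.len() < rhs.len() {
        lhs.resize(rhs.len(), 0);
    }
    let mut carry = false;
    // Zipping the two slices lets the compiler skip per-index bounds checks.
    for (a, &b) in lhs.iter_mut().zip(rhs) {
        let (sum, c1) = a.overflowing_add(b);
        let (sum, c2) = sum.overflowing_add(carry as u64);
        *a = sum;
        carry = c1 || c2;
    }
    if carry {
        // Propagate the carry through the remaining high limbs of `lhs`.
        for a in lhs.iter_mut().skip(rhs.len()) {
            let (sum, c) = a.overflowing_add(1);
            *a = sum;
            carry = c;
            if !carry {
                break;
            }
        }
        // A carry out of the top limb grows the number by one limb.
        if carry {
            lhs.push(1);
        }
    }
}
```

Calling it with `lhs = [u64::MAX, u64::MAX]` and `rhs = [1]` ripples the carry through both limbs and appends a new top limb.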

sub: Optimizes borrowing logic using u128 overflow detection and safely clamps loop ranges to handle unnormalized inputs without panicking.

sub_assign: Performs subtraction in-place on mutable buffers to eliminate expensive memory allocations during the Karatsuba recombination step.

mul: Dispatches to Karatsuba multiplication for inputs larger than 64 limbs (4096 bits) to reduce algorithmic complexity from O(N²) to roughly O(N^1.585).
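The saving comes from replacing four half-size multiplications with three. The sketch below applies a single Karatsuba step to one 64-bit word split into 32-bit halves. It only illustrates the identity and is not the slice-based recursion from this PR; `karatsuba_u64` is a made-up name.

```rust
/// Multiplies two u64 values with one Karatsuba step on 32-bit halves.
/// Illustrative only; the PR applies the same identity recursively to limb slices.
fn karatsuba_u64(x: u64, y: u64) -> u128 {
    const MASK: u64 = 0xFFFF_FFFF;
    // Split x = x1 * 2^32 + x0 and y = y1 * 2^32 + y0.
    let (x1, x0) = ((x >> 32) as u128, (x & MASK) as u128);
    let (y1, y0) = ((y >> 32) as u128, (y & MASK) as u128);
    // Three multiplications instead of four.
    let z2 = x1 * y1;
    let z0 = x0 * y0;
    // (x1 + x0)(y1 + y0) - z2 - z0 equals x1*y0 + x0*y1.
    let z1 = (x1 + x0) * (y1 + y0) - z2 - z0;
    // Recombine: x * y = z2 * 2^64 + z1 * 2^32 + z0.
    (z2 << 64) + (z1 << 32) + z0
}
```

Applied recursively, three half-size products per level give the exponent log2(3) ≈ 1.585.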

mul_karatsuba_slice: Recursively computes Karatsuba products using &[u64] slices instead of cloning BigUint vectors, significantly reducing memory churn.

mul_internal_slice: Provides a highly optimized Schoolbook (O(N²)) multiplication base case that operates directly on slices for maximum cache efficiency.

div_rem_knuth (called by divmod): Replaces slow bit-by-bit binary division with Knuth's Algorithm D (base-2⁶⁴), reducing the number of division steps by a factor of 64.

root_n: Replaces Binary Search (linear convergence) with Newton's Method (quadratic convergence), reducing iteration count from thousands to dozens for large roots.
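To make the Newton iteration concrete, here is a minimal sketch of an integer n-th root on `u64`. It assumes nothing about fend's `root_n`, which works on arbitrary-precision values; the function name and the choice of starting guess are illustrative.

```rust
/// Returns floor(x^(1/n)) using Newton's method on integers.
/// Hypothetical fixed-width sketch; fend's root_n works on BigUint.
fn integer_root(x: u64, n: u32) -> u64 {
    assert!(n >= 1, "root degree must be at least 1");
    if x < 2 || n == 1 {
        return x;
    }
    // Start from a power of two that is guaranteed to be above the true root:
    // x < 2^bits, so the root is below 2^ceil(bits / n).
    let bits = 64 - x.leading_zeros();
    let shift = bits / n + u32::from(bits % n != 0);
    let mut r: u128 = 1u128 << shift;
    let x = x as u128;
    let n128 = n as u128;
    loop {
        // x / r^(n-1); if the power overflows u128 the quotient is 0.
        let q = r.checked_pow(n - 1).map_or(0, |p| x / p);
        let next = ((n128 - 1) * r + q) / n128;
        // Starting above the root, the sequence decreases until it
        // reaches the floor of the root, then stops decreasing.
        if next >= r {
            return r as u64;
        }
        r = next;
    }
}
```

For example, `integer_root(1000, 3)` steps through 16, 11, 10 and then stops, whereas bisection would halve the search interval once per step.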

lshift / rshift: Moves interrupt checks outside the loop and simplifies carry logic to allow the compiler to generate efficient block memory moves.

add_assign_shifted: A specialized helper for Karatsuba that performs a "shift-and-add" operation in one pass without creating intermediate shifted values.

gcd: Replaces the division-heavy Euclidean algorithm with Stein's Algorithm (Binary GCD),
utilizing efficient bitwise shifts and subtraction to eliminate expensive modulo operations during
fraction simplification.
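For reference, Stein's algorithm can be sketched on plain `u64` values as follows. This is a generic textbook formulation with a hypothetical name, not the BigUint code from this PR.

```rust
/// Binary GCD (Stein's algorithm) using only shifts, comparisons and subtraction.
/// Generic u64 sketch; the PR implements the same idea on BigUint.
fn binary_gcd(mut a: u64, mut b: u64) -> u64 {
    if a == 0 {
        return b;
    }
    if b == 0 {
        return a;
    }
    // Factors of two shared by both inputs are restored at the end.
    let shift = (a | b).trailing_zeros();
    // Make `a` odd; remaining factors of two in `a` are not part of the GCD.
    a >>= a.trailing_zeros();
    loop {
        // Strip factors of two from `b` so both operands are odd.
        b >>= b.trailing_zeros();
        // Keep a <= b so the subtraction cannot underflow.
        if a > b {
            std::mem::swap(&mut a, &mut b);
        }
        // The difference of two odd numbers is even, so `b` shrinks quickly.
        b -= a;
        if b == 0 {
            return a << shift;
        }
    }
}
```

Each round replaces a modulo with a shift and a subtraction, which are cheap on multi-limb integers.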

@Safari77 Safari77 marked this pull request as draft December 31, 2025 10:36
@Safari77 Safari77 marked this pull request as ready for review December 31, 2025 10:46
@Safari77 Safari77 changed the title from "make it 250 times faster" to "make some calculations 250-500 times faster" Dec 31, 2025
codecov bot commented Jan 1, 2026

Codecov Report

❌ Patch coverage is 75.47974% with 115 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.17%. Comparing base (9636b16) to head (6a52ead).
⚠️ Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| core/src/num/biguint.rs | 75.47% | 115 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #372      +/-   ##
==========================================
- Coverage   81.47%   81.17%   -0.30%     
==========================================
  Files          52       52              
  Lines       14717    15032     +315     
==========================================
+ Hits        11990    12202     +212     
- Misses       2727     2830     +103     

@printfn printfn merged commit 602508d into printfn:main Jan 1, 2026
7 checks passed
printfn commented Jan 1, 2026

Thank you!
